Statistical models for analyzing human genetic variation

Author

  • Sriram Sankararaman
Abstract

Doctor of Philosophy in Computer Science and the Designated Emphasis in Computational and Genomic Biology, University of California, Berkeley. Professor Michael I. Jordan, Chair.

Advances in sequencing and genomic technologies are providing new opportunities to understand the genetic basis of phenotypes such as diseases. Translating the large volumes of heterogeneous, often noisy, data into biological insights presents challenging problems of statistical inference. In this thesis, we focus on three important statistical problems that arise in our efforts to understand the genetic basis of phenotypic variation in humans.

At the molecular level, we focus on the problem of identifying the amino acid residues in a protein that are important for its function. Identifying functional residues is essential to understanding the effect of genetic variation on protein function as well as to understanding protein function itself. We propose computational methods that predict functional residues using evolutionary information alone as well as a combination of evolutionary and structural information. We demonstrate that these methods can accurately predict catalytic residues in enzymes. Case studies on well-studied enzymes show that these methods can be useful in guiding future experiments.

At the population level, discovering the link between genetic and phenotypic variation requires an understanding of the genetic structure of human populations. A common form of population structure is that found in admixed groups formed by the intermixing of several ancestral populations, such as African-Americans and Latinos. We describe a Bayesian hidden Markov model of admixture and propose efficient algorithms to infer the fine-scale structure of admixed populations. We show that the fine-scale structure of these populations can be inferred even when the ancestral populations are unknown or extinct. Further, the inference algorithm can run efficiently on genome-scale datasets. This model is well suited to estimating other parameters of biological interest, such as the allele frequencies of ancestral populations, which can be used, in turn, to reconstruct extinct populations.

Finally, we address the problem of sharing genomic data while preserving the privacy of individual participants. We analyze the problem of detecting an individual genotype from


Similar resources

Modeling Haplotype-Haplotype Interactions in Case-Control Genetic Association Studies

Haplotype analysis has been increasingly used to study the genetic basis of human diseases, but models for characterizing genetic interactions between haplotypes from different chromosomal regions have not been well developed in the current literature. In this article, we describe a statistical model for testing haplotype-haplotype interactions for human diseases with a case-control genetic ass...


Association of rs12913832 in the HERC2 Gene Affecting Human Iris Color Variation

Introduction: Human eye colour as a physical trait is based on the developmental biology and genetic determinants of the structure known as the iris, which is part of the uveal tract of the eye. Prediction of externally visible characteristics (EVCs) by genotyping informative SNPs in DNA as a biological witness opens up a new avenue in forensic genetics. Variation of iris color relies on the amounts...


Predictive Ability of Statistical Genomic Prediction Methods When Underlying Genetic Architecture of Trait Is Purely Additive

A simulation study was conducted to address how a purely additive (simple) genetic architecture might affect the efficacy of parametric and non-parametric genomic prediction methods. For this purpose, we simulated a trait with narrow-sense heritability h² = 0.3, with only additive genetic effects for 300 loci, in order to compare the predictive ability of 14 more practically used ge...


Thesis proposal Learning Ancestral Genetic Processes using Nonparametric Bayesian Models

The recent explosion of genomic data has fueled the long-standing interest in analyzing genetic variation to reconstruct the evolutionary history and ancestral structures of human populations, which can provide essential clues for various medical applications. Although genetic properties such as linkage disequilibrium (LD) and population structure are closely related under a common inheritance proc...


Genetic Variation of Seed Related Traits in Festuca arundinacea Using Multivariate Statistical Methods

Genetic diversity is the basis of breeding studies in many plant species and is one of the most important indicators for selecting parents. The aim of this experiment was to investigate the genetic diversity of tall fescue (Festuca arundinacea) using agronomic traits such as plant height, spring growth score, days to flowering, days to pollination, flag leaf length and width, panicle length, we...




Publication date: 2010